End-to-End Waveform Utterance Enhancement for Direct Evaluation Metrics Optimization by Fully Convolutional Neural Networks
Authors
Abstract
A speech enhancement model maps noisy speech to clean speech. During training, an objective function is adopted to optimize the model parameters. In most studies, however, the criterion used to optimize the model is inconsistent with the criterion used to evaluate the enhanced speech. For example, speech intelligibility is most commonly evaluated with the short-time objective intelligibility (STOI) measure, whereas models are widely optimized with the frame-based minimum mean square error (MMSE) between the estimated and clean speech. Because of this inconsistency, there is no guarantee that the trained model performs optimally in applications. In this study, we propose an end-to-end, utterance-based speech enhancement framework using fully convolutional neural networks (FCNs) to reduce the gap between the model optimization criterion and the evaluation criterion (the true target). Because optimization is performed at the utterance level, temporal correlations across long speech segments, or even across the entire utterance, can be taken into account when perception-based objective functions are optimized directly. As an example, we apply the proposed FCN enhancement framework to optimize the STOI measure. Experimental results show that, owing to the consistency between the training and evaluation targets, the enhanced test speech achieves higher STOI scores than MMSE-optimized speech. Moreover, integrating STOI into model optimization also substantially improves the performance of an automatic speech recognition (ASR) system on the enhanced speech compared with speech enhanced under the MMSE criterion.
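The mismatch between an MMSE training criterion and a STOI-style evaluation criterion can be illustrated with a small NumPy sketch. This is not the paper's FCN or its STOI objective; it is a hypothetical, simplified "STOI-like" score under stated assumptions: equal-width frequency bands stand in for STOI's one-third-octave bands, and there is no resampling to 10 kHz, no silent-frame removal, and no normalization/clipping step. The framing (256-sample Hann frames, 50% overlap, 512-point FFT, 30-frame segments) only loosely mirrors STOI. The function names `frame_mse`, `band_envelopes`, and `stoi_like` are illustrative, and amplitude-modulated noise is used as a crude stand-in for speech.

```python
import numpy as np

def frame_mse(clean, enhanced, frame_len=256, hop=128):
    """Average frame-level mean squared error (an MMSE-style training criterion)."""
    n_frames = 1 + (len(clean) - frame_len) // hop
    errs = [np.mean((clean[i * hop:i * hop + frame_len]
                     - enhanced[i * hop:i * hop + frame_len]) ** 2)
            for i in range(n_frames)]
    return float(np.mean(errs))

def band_envelopes(x, frame_len=256, hop=128, n_fft=512, n_bands=15):
    """Short-time band envelopes; equal-width bands are a simplification of STOI's 1/3-octave bands."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))       # (n_frames, n_fft // 2 + 1)
    edges = np.linspace(0, mag.shape[1], n_bands + 1).astype(int)
    # Band envelope = root of summed power over the bins in each band.
    return np.stack([np.sqrt(np.sum(mag[:, a:b] ** 2, axis=1))
                     for a, b in zip(edges[:-1], edges[1:])])  # (n_bands, n_frames)

def stoi_like(clean, enhanced, seg_len=30):
    """Mean correlation between clean and enhanced band envelopes over sliding seg_len-frame segments."""
    X, Y = band_envelopes(clean), band_envelopes(enhanced)
    scores = []
    for m in range(seg_len, X.shape[1] + 1):
        x = X[:, m - seg_len:m] - X[:, m - seg_len:m].mean(axis=1, keepdims=True)
        y = Y[:, m - seg_len:m] - Y[:, m - seg_len:m].mean(axis=1, keepdims=True)
        num = np.sum(x * y, axis=1)
        den = np.linalg.norm(x, axis=1) * np.linalg.norm(y, axis=1) + 1e-12
        scores.append(num / den)                              # one correlation per band
    return float(np.mean(scores))

# Usage: a pure gain error versus additive residual noise.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = rng.standard_normal(t.size) * (1.0 + 0.9 * np.sin(2 * np.pi * 4 * t))
scaled = 0.5 * clean                                  # gain error only
noisy = clean + 0.3 * rng.standard_normal(t.size)     # additive residual noise

for name, est in [("scaled", scaled), ("noisy", noisy)]:
    print(name, "MSE=%.3f" % frame_mse(clean, est), "STOI-like=%.3f" % stoi_like(clean, est))
```

Here the MSE ranks `noisy` as the better estimate, while the correlation-based score ranks `scaled` as better, because the correlation is invariant to a uniform gain. A model trained only to minimize MSE is therefore not guaranteed to maximize an intelligibility measure of this kind, which is the gap the abstract describes.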
Similar resources
A Deep Model for Super-resolution Enhancement from a Single Image
This study presents a method to reconstruct a high-resolution image using a deep convolutional neural network. We propose a deep model, entitled Deep Block Super Resolution (DBSR), by fusing the output features of a deep convolutional network and a shallow convolutional network. In this way, our model benefits from high frequency and low frequency features extracted from deep and shallow networks...
Fast Recurrent Fully Convolutional Networks for Direct Perception in Autonomous Driving
Deep convolutional neural networks (CNNs) have been shown to perform extremely well at a variety of tasks including subtasks of autonomous driving such as image segmentation and object classification. However, networks designed for these tasks typically require vast quantities of training data and long training periods to converge. We investigate the design rationale behind end-to-end driving n...
A hybrid EEG-based emotion recognition approach using Wavelet Convolutional Neural Networks (WCNN) and support vector machine
Nowadays, deep learning and convolutional neural networks (CNNs) have become widespread tools in many biomedical engineering studies. A CNN is an end-to-end tool that integrates the processing procedure, but in some situations it needs to be fused with machine learning methods to be more accurate. In this paper, a hybrid approach based on deep features extracted from Wave...
2D-3D Fully Convolutional Neural Networks for Cardiac MR Segmentation
In this paper, we develop 2D and 3D segmentation pipelines for fully automated cardiac MR image segmentation using Deep Convolutional Neural Networks (CNN). Our models are trained end-to-end from scratch using the ACD Challenge 2017 dataset comprising 100 studies, each containing Cardiac MR images in the End Diastole and End Systole phases. We show that both our segmentation models achieve near...
Exploring the Effectiveness of Convolutional Neural Networks for Answer Selection in End-to-End Question Answering
Most work on natural language question answering today focuses on answer selection: given a candidate list of sentences, determine which contains the answer. Although important, answer selection is only one stage in a standard end-to-end question answering pipeline. This paper explores the effectiveness of convolutional neural networks (CNNs) for answer selection in an end-to-end context using th...
Journal title:
- CoRR
Volume: abs/1709.03658
Publication date: 2017